从零开始构建 Web 应用(一)
这是系列文章的第一篇,我将介绍如何使用 Python 从零开始构建一个 Web 应用程序(及其 Web 服务器)。在本系列中,我们将仅依赖 Python 标准库,并忽略 WSGI 标准。
话不多说,让我们开始吧。
Web 服务器
首先,我们将编写一个 HTTP 服务器来驱动我们的 Web 应用。但在此之前,我们需要花点时间研究一下 HTTP 协议是如何工作的。
HTTP 的工作原理
简单来说,HTTP 客户端通过网络连接到 HTTP 服务器,并发送代表请求的数据字符串。服务器随后解释该请求,并向客户端发送响应。整个协议以及这些请求和响应的格式在 RFC2616 中都有描述,但我将在此非正式地描述它们,以免你需要阅读整个文档。
请求格式
请求由一系列以 \r\n
分隔的行表示,其中第一行称为"请求行"。请求行由 HTTP 方法、空格、被请求的文件路径、另一个空格、客户端使用的 HTTP 协议版本,最后以回车符( \r
)和换行符( \n
)结束:
GET /some-path HTTP/1.1\r\n
请求行之后是零个或多个头部行。每个头部行由头部名称、冒号、可选值,以及 \r\n
:组成:
Host: example.com\r\n
Accept: text/html\r\n
头部部分的结束由一个空行表示:
\r\n
最后,请求可能包含一个"body"——即随请求发送给服务器的任意负载。
将所有内容整合在一起,这是一个简单的 GET
请求:
GET / HTTP/1.1\r\n
Host: example.com\r\n
Accept: text/html\r\n
\r\n
这是一个带有 body 的简单 POST
请求:
POST / HTTP/1.1\r\n
Host: example.com\r\n
Accept: application/json\r\n
Content-type: application/json\r\n
Content-length: 2\r\n
\r\n
{}
响应格式
响应与请求类似,都是由一系列以 \r\n
分隔的行组成。响应的第一行称为"状态行",它由 HTTP 协议版本、一个空格、响应状态码、另一个空格、状态码原因,然后以 \r\n
结尾:
HTTP/1.1 200 OK\r\n
状态行之后是响应头,接着是一个空行,然后是可选的响应体:
HTTP/1.1 200 OK\r\n
Content-type: text/html\r\n
Content-length: 15\r\n
\r\n
<h1>Hello!</h1>
一个简单的服务器
根据我们目前对协议的了解,让我们编写一个服务器,无论接收到的请求如何,都发送相同的响应。
首先,我们需要创建一个套接字,将其绑定到地址,然后开始监听连接。
import socket
HOST = "127.0.0.1"
PORT = 9000
# By default, socket.socket creates TCP sockets.
with socket.socket() as server_sock:
# This tells the kernel to reuse sockets that are in `TIME_WAIT` state.
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# This tells the socket what address to bind to.
server_sock.bind((HOST, PORT))
# 0 is the number of pending connections the socket may have before
# new connections are refused. Since this server is going to process
# one connection at a time, we want to refuse any additional connections.
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
如果你现在尝试运行这段代码,它会打印到标准输出,说明它在监听 127.0.0.1:9000
,然后退出。为了实际处理传入的连接,我们需要在套接字上调用 accept
方法。这样做将使进程阻塞,直到客户端连接到我们的服务器。
with socket.socket() as server_sock:
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
client_sock, client_addr = server_sock.accept()
print(f"New connection from {client_addr}.")
一旦我们与客户端建立了套接字连接,我们就可以开始与其通信。使用 sendall
方法,让我们向连接的客户端发送一个示例响应:
RESPONSE = b"""\
HTTP/1.1 200 OK
Content-type: text/html
Content-length: 15
<h1>Hello!</h1>""".replace(b"\n", b"\r\n")
with socket.socket() as server_sock:
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
client_sock, client_addr = server_sock.accept()
print(f"New connection from {client_addr}.")
with client_sock:
client_sock.sendall(RESPONSE)
如果你现在运行代码,然后在你最喜欢的浏览器中访问 http://127.0.0.1:9000
,它应该会渲染字符串"Hello!"。不幸的是,服务器在发送响应后会退出,所以刷新页面会失败。让我们来修复这个问题:
RESPONSE = b"""\
HTTP/1.1 200 OK
Content-type: text/html
Content-length: 15
<h1>Hello!</h1>""".replace(b"\n", b"\r\n")
with socket.socket() as server_sock:
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
while True:
client_sock, client_addr = server_sock.accept()
print(f"New connection from {client_addr}.")
with client_sock:
client_sock.sendall(RESPONSE)
此时,我们已经有了一个能够对每个请求都返回一个简单 HTML 页面的 Web 服务器,全部代码大约 25 行。这还不错!
一个文件服务器
让我们扩展 HTTP 服务器,使其能够从磁盘提供文件。
请求抽象
在这样做之前,我们必须能够从客户端读取和解析传入的请求数据。由于我们知道请求数据是由一系列以 \r\n
字符分隔的行组成的,让我们编写一个生成器函数,该函数从套接字读取数据并生成每一行:
import typing
def iter_lines(sock: socket.socket, bufsize: int = 16_384) -> typing.Generator[bytes, None, bytes]:
"""Given a socket, read all the individual CRLF-separated lines
and yield each one until an empty one is found. Returns the
remainder after the empty line.
"""
buff = b""
while True:
data = sock.recv(bufsize)
if not data:
return b""
buff += data
while True:
try:
i = buff.index(b"\r\n")
line, buff = buff[:i], buff[i + 2:]
if not line:
return buff
yield line
except IndexError:
break
这可能看起来有点令人望而生畏,但它本质上做的事情是尽可能多地从套接字中读取数据(以 bufsize
块的形式),将这些数据连接到一个缓冲区( buff
)中,并不断地将缓冲区分割成单独的行,一次生成一行。一旦它找到一个空行,它就会返回它读取的额外数据。
使用 iter_lines
,我们可以开始打印我们从客户端收到的请求:
with socket.socket() as server_sock:
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
while True:
client_sock, client_addr = server_sock.accept()
print(f"New connection from {client_addr}.")
with client_sock:
for request_line in iter_lines(client_sock):
print(request_line)
client_sock.sendall(RESPONSE)
如果你现在运行服务器并访问 http://127.0.0.1:9000
,你应该会在控制台看到类似这样的内容:
Received connection from ('127.0.0.1', 62086)...
b'GET / HTTP/1.1'
b'Host: localhost:9000'
b'Connection: keep-alive'
b'Cache-Control: max-age=0'
b'Upgrade-Insecure-Requests: 1'
b'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36'
b'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
b'Accept-Encoding: gzip, deflate, br'
b'Accept-Language: en-US,en;q=0.9,ro;q=0.8'
真不错!让我们通过定义一个 Request
类来抽象这些数据:
import typing
class Request(typing.NamedTuple):
method: str
path: str
headers: typing.Mapping[str, str]
目前,请求类只关心方法、路径和请求头。我们将解析查询字符串参数和读取请求体的工作留到以后。
为了封装构建请求所需的逻辑,我们将在 Request
中添加一个类方法 from_socket
:
class Request(typing.NamedTuple):
method: str
path: str
headers: typing.Mapping[str, str]
@classmethod
def from_socket(cls, sock: socket.socket) -> "Request":
"""Read and parse the request from a socket object.
Raises:
ValueError: When the request cannot be parsed.
"""
lines = iter_lines(sock)
try:
request_line = next(lines).decode("ascii")
except StopIteration:
raise ValueError("Request line missing.")
try:
method, path, _ = request_line.split(" ")
except ValueError:
raise ValueError(f"Malformed request line {request_line!r}.")
headers = {}
for line in lines:
try:
name, _, value = line.decode("ascii").partition(":")
headers[name.lower()] = value.lstrip()
except ValueError:
raise ValueError(f"Malformed header line {line!r}.")
return cls(method=method.upper(), path=path, headers=headers)
它使用我们之前定义的 iter_lines
函数来读取请求行。在那里它获取 method
和 path
,然后读取每一行单独的头部信息并解析这些信息。最后,它构建 Request
对象并返回。如果我们把这个集成到我们的服务器循环中,它应该看起来像这样:
with socket.socket() as server_sock:
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
while True:
client_sock, client_addr = server_sock.accept()
print(f"Received connection from {client_addr}...")
with client_sock:
request = Request.from_socket(client_sock)
print(request)
client_sock.sendall(RESPONSE)
如果你现在连接到服务器,你应该会看到类似下面这行的输出:
Request(method='GET', path='/', headers={'host': 'localhost:9000', 'user-agent': 'curl/7.54.0', 'accept': '*/*'})
因为 from_socket
在某些情况下可能会引发异常,如果现在给服务器发送一个无效请求,服务器可能会崩溃。为了模拟这种情况,你可以使用 telnet 连接到服务器并发送一些虚假数据:
~> telnet 127.0.0.1 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
Connection closed by foreign host.
果然,服务器崩溃了:
Received connection from ('127.0.0.1', 62404)...
Traceback (most recent call last):
File "server.py", line 53, in parse
request_line = next(lines).decode("ascii")
ValueError: not enough values to unpack (expected 3, got 1)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "server.py", line 82, in <module>
with client_sock:
File "server.py", line 55, in parse
raise ValueError("Request line missing.")
ValueError: Malformed request line 'hello'.
为了更优雅地处理这类问题,让我们将 from_socket
的调用包裹在一个 try-except 块中,并在接收到格式错误的请求时向客户端发送"400 Bad Request"响应:
BAD_REQUEST_RESPONSE = b"""\
HTTP/1.1 400 Bad Request
Content-type: text/plain
Content-length: 11
Bad Request""".replace(b"\n", b"\r\n")
with socket.socket() as server_sock:
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
while True:
client_sock, client_addr = server_sock.accept()
print(f"Received connection from {client_addr}...")
with client_sock:
try:
request = Request.from_socket(client_sock)
print(request)
client_sock.sendall(RESPONSE)
except Exception as e:
print(f"Failed to parse request: {e}")
client_sock.sendall(BAD_REQUEST_RESPONSE)
如果我们现在尝试破坏它,客户端将收到一个响应,而服务器将保持运行:
~> telnet 127.0.0.1 9000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello
HTTP/1.1 400 Bad Request
Content-type: text/plain
Content-length: 11
Bad RequestConnection closed by foreign host.
此时我们准备好开始实现文件服务部分,但首先让我们将默认响应设置为"404 Not Found"响应:
NOT_FOUND_RESPONSE = b"""\
HTTP/1.1 404 Not Found
Content-type: text/plain
Content-length: 9
Not Found""".replace(b"\n", b"\r\n")
#...
with socket.socket() as server_sock:
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
while True:
client_sock, client_addr = server_sock.accept()
print(f"Received connection from {client_addr}...")
with client_sock:
try:
request = Request.from_socket(client_sock)
print(request)
client_sock.sendall(NOT_FOUND_RESPONSE)
except Exception as e:
print(f"Failed to parse request: {e}")
client_sock.sendall(BAD_REQUEST_RESPONSE)
此外,让我们添加一个"405 Method Not Allowed"响应。当我们接收到非 GET
请求时,我们将需要它。
METHOD_NOT_ALLOWED_RESPONSE = b"""\
HTTP/1.1 405 Method Not Allowed
Content-type: text/plain
Content-length: 17
Method Not Allowed""".replace(b"\n", b"\r\n")
让我们定义一个 SERVER_ROOT
常量来表示服务器应从哪里提供文件,以及一个 serve_file
函数。
import mimetypes
import os
import socket
import typing
SERVER_ROOT = os.path.abspath("www")
FILE_RESPONSE_TEMPLATE = """\
HTTP/1.1 200 OK
Content-type: {content_type}
Content-length: {content_length}
""".replace("\n", "\r\n")
def serve_file(sock: socket.socket, path: str) -> None:
"""Given a socket and the relative path to a file (relative to
SERVER_SOCK), send that file to the socket if it exists. If the
file doesn't exist, send a "404 Not Found" response.
"""
if path == "/":
path = "/index.html"
abspath = os.path.normpath(os.path.join(SERVER_ROOT, path.lstrip("/")))
if not abspath.startswith(SERVER_ROOT):
sock.sendall(NOT_FOUND_RESPONSE)
return
try:
with open(abspath, "rb") as f:
stat = os.fstat(f.fileno())
content_type, encoding = mimetypes.guess_type(abspath)
if content_type is None:
content_type = "application/octet-stream"
if encoding is not None:
content_type += f"; charset={encoding}"
response_headers = FILE_RESPONSE_TEMPLATE.format(
content_type=content_type,
content_length=stat.st_size,
).encode("ascii")
sock.sendall(response_headers)
sock.sendfile(f)
except FileNotFoundError:
sock.sendall(NOT_FOUND_RESPONSE)
return
serve_file
接收客户端套接字和文件路径。然后尝试解析该路径以在 SERVER_ROOT
中找到实际文件,如果文件解析在服务器根目录之外,则返回"未找到"响应。接着尝试打开文件并确定其 MIME 类型和大小(使用 os.fstat
),然后构建响应头并使用 sendfile
系统调用将文件写入套接字。如果磁盘上找不到文件,则发送"未找到"响应。
如果我们添加 serve_file
,我们的服务器循环现在应该看起来像这样:
如果我们添加 serve_file ,我们的服务器循环现在应该看起来像这样:
with socket.socket() as server_sock:
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_sock.bind((HOST, PORT))
server_sock.listen(0)
print(f"Listening on {HOST}:{PORT}...")
while True:
client_sock, client_addr = server_sock.accept()
print(f"Received connection from {client_addr}...")
with client_sock:
try:
request = Request.from_socket(client_sock)
if request.method != "GET":
client_sock.sendall(METHOD_NOT_ALLOWED_RESPONSE)
continue
serve_file(client_sock, request.path)
except Exception as e:
print(f"Failed to parse request: {e}")
client_sock.sendall(BAD_REQUEST_RESPONSE)
如果你在 server.py
文件旁边添加一个名为 www/index.html
的文件,并访问 http://localhost:9000
,你应该能看到那个文件的内容。酷吧?
收尾
第一部分就到这里。在第二部分,我们将涵盖提取 Server
和 Response
抽象以及让服务器处理多个并发连接的内容。如果你想查看完整源代码并跟随学习,可以在这里找到。
下次再见!